Abstract


An estimation tool for symmetric univariate nonlinear regression is presented. The method is 
based on introducing a nontrivial set of affine coordinates for diffeomorphisms of the real line. 
The main ingredient making the computations possible is the Connes-Moscovici Hopf algebra of 
these affine coordinates. 
1 Introduction


Usual univariate regression analysis makes a distinction between the predictor and the
outcome variable. In some situations, however, a completely symmetric handling of the two
variables is required. One such example, the main motivation behind this investigation, is the
equating of educational tests (see von Davier, Holland, & Thayer, 2004, for an introduction to
test equating). When the same sample of students takes two different tests (Test A and
Test B), there is no natural order of the two tests in the resulting data for the test scores. That
is, the roles of Test A and Test B are interchangeable, and this interchangeability is referred to
as the symmetry of the data set. Consequently, any model based on these data should reflect
this symmetry. Ordinary least squares linear regression will result, in general, in two different
regression lines: one when Test A is fitted to Test B, and another when Test B is
fitted to Test A.
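
The asymmetry of ordinary least squares can be seen in a few lines. The following is a minimal sketch with made-up scores; the helper `ols_line` and the data are illustrative assumptions and are not taken from any study.

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical paired scores (Test A, Test B) for five students.
test_a = [10.0, 12.0, 15.0, 19.0, 24.0]
test_b = [11.0, 15.0, 14.0, 22.0, 23.0]

b_on_a, _ = ols_line(test_a, test_b)  # Test B fitted to Test A
a_on_b, _ = ols_line(test_b, test_a)  # Test A fitted to Test B

# A symmetric method would give reciprocal slopes (product equal to 1).
# For OLS the product is the squared correlation, which is below 1
# unless the points are collinear.
product = b_on_a * a_on_b
```

Here the product of the two slopes is the squared correlation of the data, so the two fitted lines are inverses of each other only when the points lie exactly on a line.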

For linear regression, there are known symmetric methods: one of them is obtained by
measuring the distance between the points of the scatter plot and the regression line along line
segments perpendicular to the regression line (Golub & Loan, 1989; Nievergelt, 1994; Sardelis &
Valahas, 2004). Some statistical advantages of the symmetric viewpoint are detailed in Sardelis and
Valahas (2004). The method was also found superior to the usual least squares approach in the field
of image reconstruction (see Hamid, Bobick, & Yezzi, 2004; Kennedy, Buxton, & Gibly, 1999; and
references therein).

For nonlinear regression, even the choice of the family of possible regression functions is a
nontrivial question. The usual next level of generalization, polynomial regression, is not a good
candidate for several reasons. First, for a higher order polynomial, the inverse does not always
exist. Even when it does, it generally cannot be written in closed form. Moreover, if the degree of
the polynomial is larger than 1, the inverse is not a polynomial, which prohibits symmetric handling
of the data using polynomials exclusively. Considering the larger set of functions containing
invertible polynomials and their inverses would pose an unsolvable algebraic challenge, in addition
to being awkward.

For these reasons, this paper introduces a solution based on diffeomorphisms along with their
natural affine coordinatization introduced by Connes and Moscovici (1998). A real diffeomorphism
is a differentiable one-to-one and onto function $\mathbb{R} \to \mathbb{R}$ with a nonzero
derivative. It is easy to see that its inverse is also a diffeomorphism. With this large family of
functions, symmetry is readily taken into account. The challenge lies in finding suitable subspaces
of diffeomorphisms for regression and a practical way of handling the inverse of a diffeomorphism.


2 Preliminaries 

A two-dimensional scatter plot is a finite subset $D_{obs} = \{(x_i, y_i) \mid i = 1, \dots, N\}$ of
$\mathbb{R}^2$. The word scatter plot is used here instead of the usual terminology data to emphasize
the geometrical nature of our problem. Often one has a model expressed through a family
of functions $\mathcal{F} \subset \mathrm{Function}(\mathbb{R}, \mathbb{R})$ and the problem is to
find a member $f \in \mathcal{F}$ so that $D_m = \{(x_i, f(x_i)) \mid i = 1, \dots, N\}$ and $D_{obs}$
are as closely related as possible. This regression function is denoted by
$R_{\mathcal{F}}(D_{obs}) := f$. That is, $R_{\mathcal{F}}$ is defined as a map from the set of scatter
plots to $\mathcal{F}$. For example, when the model is given by $P_n$ (polynomials up to degree $n$) and
closeness is defined by the distance squared,

$$d(D_{obs}, D_m) := \sum_{i=1}^{N} \left(y_i - p(x_i)\right)^2, \quad p \in P_n,$$

being small, one deals with polynomial least squares regression.

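This paper is concerned with the case when the family of functions is the set
$\mathrm{Diff}(\mathbb{R})^+$ of increasing diffeomorphisms,¹

$$\mathrm{Diff}(\mathbb{R})^+ = \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ is a bijection},\ f^{(n)} \text{ exists } \forall n,\ f' > 0 \} \subset \mathrm{Diff}(\mathbb{R}).$$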


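Moreover, the goal is to find the symmetric regression $\phi \in \mathrm{Diff}(\mathbb{R})^+$ for the
scatter plot $D_{obs}$. For symmetric regression, use the following:


Definition 1 The regression map $R^{symm}_{\mathrm{Diff}(\mathbb{R})^+}$ from scatter plots to
diffeomorphisms is a symmetric regression if, whenever
$\phi = R^{symm}_{\mathrm{Diff}(\mathbb{R})^+}(D_{obs})$ is the regression on $D_{obs}$, its inverse is
the regression on $D_{obs}^{-1} = \{(y_i, x_i) \mid i = 1, \dots, N\}$, that is,

$$\phi = R^{symm}_{\mathrm{Diff}(\mathbb{R})^+}(D_{obs}) \;\Longrightarrow\; \phi^{-1} = R^{symm}_{\mathrm{Diff}(\mathbb{R})^+}(D_{obs}^{-1}).$$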


Now assume that $\phi : \mathbb{R} \to \mathbb{R}$ is an increasing diffeomorphism of the real line.
First, factor the diffeomorphism as a composition of a linear function $e$ and a diffeomorphism
$\varphi$,

$$\phi = e \circ \varphi, \tag{1}$$

so that $\varphi(0) = 0$ and $\varphi'(0) = 1$. That is,

$$e(x) = \phi(0) + \phi'(0)\, x, \tag{2}$$

$$\varphi(x) = \frac{\phi(x) - \phi(0)}{\phi'(0)}. \tag{3}$$

Let $G_2 = \{\varphi \in \mathrm{Diff}(\mathbb{R})^+ \mid \varphi(0) = 0,\ \varphi'(0) = 1\}$ denote the
collection of all diffeomorphisms without linear part.²

The reason for this decomposition is that now it is possible to define linear and nonlinear 
symmetric regression. That is, the decomposition (1) can be thought of as the first step towards 
defining the degree of a diffeomorphism. 

Following this natural decomposition, this paper first discusses the linear symmetric regression
and then presents a solution for handling the nonlinear (or $G_2$) part of the problem.
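
The factorization into a linear and a nonlinear part can be sketched in code. This is only an illustration: the function `factor`, the example map, and the requirement that the caller supplies the derivative at 0 are assumptions made here.

```python
import math

def factor(phi, dphi0):
    """Split phi into (e, varphi) with phi = e o varphi.

    dphi0 is the derivative of phi at 0, supplied by the caller.
    e is the linear part; varphi satisfies varphi(0) = 0, varphi'(0) = 1.
    """
    phi0 = phi(0.0)

    def e(x):
        return phi0 + dphi0 * x

    def varphi(x):
        return (phi(x) - phi0) / dphi0

    return e, varphi

def phi(x):
    # Hypothetical increasing diffeomorphism with phi(0) = 3, phi'(0) = 2.
    return 3.0 + 2.0 * math.sinh(x)

e, varphi = factor(phi, 2.0)
```

For this example the nonlinear part is `sinh`, which indeed vanishes at 0 with unit derivative there.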


3 Symmetric Linear Regression 

First consider the case when the fitted function is a line, $e(x) = bx + a$. It is easy to see that
the usual vertical least squares solution is symmetric if the correlation of the data is 1:
$\rho(D_{obs}) = 1$. In this case the regression is the unique line passing through all data points.
If the distance between the regression line and the points of the scatter plot is measured
perpendicularly to the regression line, then the resulting linear regression is symmetric. More
generally, suppose the distance between a data point and the regression line is measured along a line
with slope $s(b)$ given by a symmetric slope function $s : \mathbb{R} \to \mathbb{R}^\times$
($\mathbb{R}^\times = \mathbb{R} \setminus \{0\}$ is the set of nonzero real numbers) depending on the
slope $b$ of the regression line. To satisfy the symmetry requirement, the distance for the inverse
should then be computed along a line with slope $s(1/b)$. Because exchanging the two coordinates
turns a line of slope $s(b)$ into a line of slope $1/s(b)$, this gives the symmetry condition for $s$ as

$$s(b)\, s\!\left(\frac{1}{b}\right) = 1. \tag{4}$$

The above mentioned perpendicular solution is obtained by setting

$$s(b) = -\frac{1}{b}. \tag{5}$$

For the value $s(1)$, (4) gives $s^2(1) = 1$, that is, $s(1) = \pm 1$. Only $s(1) = -1$, however, is
a meaningful solution. Note that (4) can be used to obtain symmetric slope functions by setting
$s : (0, 1] \to \mathbb{R}^\times$ arbitrarily with the only restriction $s(1) = -1$ and defining
$s(b) = 1/s(1/b)$ for $b > 1$.
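
The extension of a slope function from $(0, 1]$ to $b > 1$ can be sketched as follows. The helper `extend_symmetric` and the second example function are hypothetical choices made for illustration; only positive slopes $b$ are handled.

```python
def extend_symmetric(s_low):
    """Extend a slope function given on (0, 1] to all b > 0.

    For b > 1 the value is defined as 1 / s_low(1 / b), so that
    s(b) * s(1 / b) = 1 holds by construction. The caller is expected
    to pass a function with s_low(1) == -1.
    """
    def s(b):
        if b <= 0:
            raise ValueError("slope b must be positive")
        return s_low(b) if b <= 1.0 else 1.0 / s_low(1.0 / b)
    return s

# The perpendicular direction, s(b) = -1/b.
perpendicular = extend_symmetric(lambda b: -1.0 / b)

# A hypothetical non-perpendicular choice on (0, 1] with s(1) = -1.
skewed = extend_symmetric(lambda b: -(1.0 + b) / 2.0)
```

Both functions satisfy the symmetry condition, which shows that the perpendicular direction is only one of many admissible choices.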
Smooth solutions should be infinitely many times differentiable at $b = 1$, resulting in a series of
conditions for the derivatives $s_1^{(n)} := s^{(n)}(1)$ of $s$ at $b = 1$:

$$s_1 = -1, \tag{6}$$

$$s_1'' = -(s_1')^2 - s_1', \tag{7}$$

$$s_1^{(4)} = 3 (s_1')^4 + 12 (s_1')^3 + 15 (s_1')^2 - 4 s_1^{(3)} s_1' + 6 s_1' - 6 s_1^{(3)}. \tag{8}$$
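
As a plausibility check, the condition on the second derivative can be recovered by differentiating the symmetry condition (4) twice. The following is a sketch that assumes $s$ is twice differentiable at $b = 1$; the helper function $g(b) = s(1/b)$ is introduced here only for the computation.

$$
\begin{aligned}
g'(b) &= -\frac{s'(1/b)}{b^2}, \qquad g''(b) = \frac{s''(1/b)}{b^4} + \frac{2\, s'(1/b)}{b^3},\\
0 = (s\, g)''(1) &= s''(1)\, g(1) + 2\, s'(1)\, g'(1) + s(1)\, g''(1)\\
&= 2\, s(1)\, s''(1) - 2\, s'(1)^2 + 2\, s(1)\, s'(1).
\end{aligned}
$$

With $s(1) = -1$ this gives $s''(1) = -s'(1)^2 - s'(1)$. For the perpendicular direction $s(b) = -1/b$ one has $s'(1) = 1$ and $s''(1) = -2$, in agreement with this relation.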


The odd degree derivatives $s_1^{(2i+1)}$ are all free. That is, there are infinitely many analytic
solutions to (4) governed by the choices for the odd degree derivatives of $s$ at $b = 1$.
Analyticity further requires the convergence of the series

$$s(b) = \sum_{n=0}^{\infty} \frac{s_1^{(n)}}{n!}\, (b-1)^n \tag{9}$$

for any $b \in \mathbb{R}$.

The perpendicular solution (5) is obtained from the choice

$$s_1^{(2i+1)} = (2i+1)!, \quad \text{for all } i \in \mathbb{N},$$

since $s(b) = -1/b$ has $s^{(n)}(1) = (-1)^{n+1}\, n!$. This choice makes (9) convergent only for
$0 < b < 2$. Because this paper is looking for a solution in the neighborhood of the perpendicular
line, this lack of analyticity should always be anticipated. To overcome this limitation in a
practical setting, one could use the series expansion (9) to define a solution over $(0, 1]$ and
extend it to $[1, \infty)$ by (4).
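
The symmetry of the perpendicular case discussed in this section can be checked numerically. The sketch below uses the standard closed-form slope of orthogonal (total least squares) regression, which is not stated in the text and is brought in here as an assumption; the function name and the data are illustrative.

```python
import math

def perpendicular_line(xs, ys):
    """Line minimizing the sum of squared perpendicular distances.

    Returns (slope, intercept). Uses the usual closed form for
    orthogonal regression; requires a nonzero covariance term.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is not determined when sxy is zero")
    diff = syy - sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx

# Hypothetical scatter plot.
xs = [10.0, 12.0, 15.0, 19.0, 24.0]
ys = [11.0, 15.0, 14.0, 22.0, 23.0]

b, a = perpendicular_line(xs, ys)          # fit on the scatter plot
b_inv, a_inv = perpendicular_line(ys, xs)  # fit on the inverse scatter plot
```

The line fitted to the inverse scatter plot has slope `1 / b` and intercept `-a / b`, that is, it is the inverse of the line fitted to the original scatter plot.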


4 Nonlinear Symmetric Regression 
4.1 Affine Coordinates for Diffeomorphisms

Many polynomials are diffeomorphisms, but the inverse of a polynomial is rarely a polynomial
itself. That is why polynomial regression, even with diffeomorphic polynomials, is not a good
candidate for symmetric regression. After the factorization $\phi = e \circ \varphi$ of a
diffeomorphism $\phi$, the nonlinear part $\varphi$ is clearly identified. In practice, this is a
two-step process. First, one fits a linear symmetric regression $e$ to the scatter plot. Then, if
the fit of $e$ is not satisfactory, symmetric nonlinear regression is performed on the scatter plot
from which the linear part $e$ has been removed. Nonlinear regression means finding the best fitting
diffeomorphism $\varphi \in G_2$.

If an arbitrary diffeomorphism $\varphi \in G_2$ were allowed during the process, the resulting
regression would be a function for which $D_{obs} = D_m$. This, however, may render the regression too
data driven and consequently less sample invariant. Also, there would be infinitely many solutions
satisfying this equality, so the decision rule would be almost useless. Hence, the degree of the
diffeomorphism for the regression should be limited, similarly to the polynomial regression case.

To overcome these problems, this paper follows Connes and Moscovici (1998) and introduces
affine coordinates for the group $G_2 \subset \mathrm{Diff}(\mathbb{R})$ by defining, for a
diffeomorphism $\varphi \in G_2$,

$$\delta_n(\varphi) := (\log \varphi')^{(n)}(0) \in \mathbb{R}, \quad \varphi \in G_2. \tag{10}$$

It is possible to locally reconstruct $\varphi$ from these affine coordinates via

$$\varphi(x) = \int_0^x \exp\left( \sum_{n=1}^{\infty} \frac{\delta_n(\varphi)}{n!}\, u^n \right) du. \tag{11}$$

The general theory of affine algebraic groups (Hochschild, 1981) implies that the resulting set of
affine coordinates carries the structure of a Hopf algebra.

For the coordinates of the inverse, there is

$$\bar\delta_n(\varphi) := \delta_n(\varphi^{-1}) = \left(\log (\varphi^{-1})'\right)^{(n)}(0). \tag{12}$$

The main advantage of this Hopf algebra-based approach is that there are explicit formulae
expressing the coordinates of the inverse in terms of the coordinates of the original function. The
first few are listed here (see the appendix for the first 10):

$$\bar\delta_1 = -\delta_1, \tag{13}$$

$$\bar\delta_2 = -\delta_2 + \delta_1^2, \tag{14}$$

$$\bar\delta_3 = -\delta_3 + 4\delta_1\delta_2 - 2\delta_1^3. \tag{15}$$


The interested reader should read the appendix for the details of how such expressions can be
derived.
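
The reconstruction formula and the first three inverse coordinates can be tried out numerically. The following is a minimal sketch, not a reference implementation: the function names, the use of a composite Simpson rule, and the example $\varphi(x) = e^x - 1$ (for which $\delta_1 = 1$ and all other coordinates vanish) are choices made here.

```python
import math

def antipode3(d1, d2, d3):
    """Coordinates of the inverse for degree three, from (13) to (15)."""
    return (-d1, -d2 + d1 ** 2, -d3 + 4 * d1 * d2 - 2 * d1 ** 3)

def log_derivative(deltas):
    """Return p(u) = sum over n of deltas[n-1] * u**n / n!."""
    def p(u):
        return sum(d * u ** n / math.factorial(n)
                   for n, d in enumerate(deltas, start=1))
    return p

def reconstruct(deltas, steps=200):
    """Return x -> integral from 0 to x of exp(p(u)) du (Simpson rule)."""
    p = log_derivative(deltas)

    def phi(x):
        h = x / steps  # steps must be even for the Simpson rule
        total = 1.0 + math.exp(p(x))  # endpoints; exp(p(0)) = 1
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * math.exp(p(k * h))
        return total * h / 3.0

    return phi

deltas = (1.0, 0.0, 0.0)                   # coordinates of exp(x) - 1
phi = reconstruct(deltas)
phi_inv = reconstruct(antipode3(*deltas))  # truncated, so only approximate
```

For this example the inverse coordinates are $(-1, 1, -2)$, which are the first three Taylor coefficients (times factorials) of $-\log(1 + y)$. Because higher coordinates of the inverse are dropped, `phi_inv(phi(x))` agrees with `x` only approximately, and only close to 0.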

4.2 Algorithm for Nonlinear Symmetric Regression

Now, a practical algorithm for computing the nonlinear symmetric regression of a scatter
plot $D_{obs}$ is provided. For a general scatter plot, the first step is to find the linear part of
the decomposition (1). That is, a function $e(x) = bx + a$ is sought, so that for the transformed
scatter plot,

$$D_{obs}^{\text{non-lin}} = e^{-1}(D_{obs}) = \{(x_i, e^{-1}(y_i)) \mid i = 1, \dots, N\}, \tag{16}$$

the symmetric linear regression would be the diagonal line. It is easy to see that this choice will
ensure the uniqueness of the linear part, since (16) means that one cannot perform two nontrivial
linear symmetric regressions one after another.

The next step is then to find a symmetric nonlinear regression $\varphi \in G_2$ on
$D_{obs}^{\text{non-lin}}$. A closer look at the definition of affine coordinates reveals that the
affine coordinates are nothing else but the terms of the Taylor expansion of the function
$\log(\varphi')$. To estimate the coordinates, one has to transform first the scatter plot
$e^{-1}(D_{obs})$ to this log derivative scale. That is, observed derivatives are computed from the
scatter plot using finite differences:

$$d_i := \frac{y_i - y_{i-1}}{x_i - x_{i-1}}, \quad \forall i = 2, \dots, n. \tag{17}$$

The observed log derivatives are then given by $l_i = \log(d_i)$, $\forall i = 2, \dots, n$. For
inside points, that is, when $1 < i < n$, there could be another estimate $d_i^s$ obtained via
averaging the incoming and outgoing slopes. To keep the symmetry of the model, one should use the
geometric mean:

$$d_i^s := \sqrt{d_i\, d_{i+1}}, \quad \forall i = 2, \dots, n-1. \tag{18}$$

Symmetry means that the slopes computed for a scatter plot $D$ are the reciprocals of the slopes
computed for $D^{-1}$. It is an easy exercise to see that the arithmetic mean does not respect this
property.
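
The difference between the two means is easy to see on a small example. The sketch below uses a made-up, strictly increasing scatter plot; the function names are illustrative only.

```python
import math

def slopes(points):
    """Finite-difference slopes between consecutive points (sorted by x)."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def geometric_interior(d):
    """Geometric mean of incoming and outgoing slopes at inside points."""
    return [math.sqrt(a * b) for a, b in zip(d, d[1:])]

def arithmetic_interior(d):
    """Arithmetic mean of incoming and outgoing slopes at inside points."""
    return [(a + b) / 2.0 for a, b in zip(d, d[1:])]

# Hypothetical strictly increasing scatter plot and its inverse.
D = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0), (3.0, 2.5)]
D_inv = [(y, x) for x, y in D]

g = geometric_interior(slopes(D))
g_inv = geometric_interior(slopes(D_inv))
a = arithmetic_interior(slopes(D))
a_inv = arithmetic_interior(slopes(D_inv))
```

The geometric means of the scatter plot and of its inverse multiply to 1 at every inside point, whereas the arithmetic means do not.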

This averaging appears to be useful in resolving the anomaly introduced by the fact that the
observed derivatives $d_i$ should correspond to a mean of $x_{i-1}$ and $x_i$ rather than to either
$x_i$ or $x_{i-1}$. This would make symmetric handling of the problem a bit awkward.

A sort of stabilization could be achieved by extending these new observations by defining

$$d_1^s := \sqrt{d_2}, \quad \text{and} \quad d_n^s := \sqrt{d_n}. \tag{19}$$

The observed log derivatives in this case are given by $l_i^s = \log(d_i^s)$, $\forall i = 1, \dots, n$.
The stabilization can be thought of as introducing two new points at the ends of the scatter plot so
that the resulting slopes are 1. This is not required for the method here but may be useful in some
applications. In what follows, this paper either assumes that stabilization has been done or that
the first and last points have been dropped and the data have been reindexed.

Let $\bar l_i^s$ denote the corresponding observed log derivatives that are derived from the inverse
scatter plot $(D_{obs}^{\text{non-lin}})^{-1}$. By construction,

$$\bar l_i^s = -l_i^s. \tag{20}$$
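
The sign relation between the two sets of observed log derivatives can be verified on a toy example. The sketch below implements the stabilization as described in the text, by padding the slope list with a slope of 1 at both ends; the function name and the data are assumptions made for illustration.

```python
import math

def stabilized_log_derivatives(points):
    """Observed log derivatives at all n points of an increasing scatter plot.

    The n - 1 finite-difference slopes are padded with a slope of 1 at
    both ends; each point then gets the log of the geometric mean of its
    incoming and outgoing slopes (the average of the two logs).
    """
    d = [(y1 - y0) / (x1 - x0)
         for (x0, y0), (x1, y1) in zip(points, points[1:])]
    padded = [1.0] + d + [1.0]
    return [0.5 * (math.log(u) + math.log(v))
            for u, v in zip(padded, padded[1:])]

# Hypothetical strictly increasing scatter plot and its inverse.
D = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0), (3.0, 2.5)]
D_inv = [(y, x) for x, y in D]

l = stabilized_log_derivatives(D)
l_bar = stabilized_log_derivatives(D_inv)
```

Each entry of `l_bar` is the negative of the corresponding entry of `l`, because every slope of the inverse scatter plot is the reciprocal of a slope of the original one.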

The problem then reduces to fitting two polynomials simultaneously with a certain maximum
degree $K$. To this end, associate with a polynomial

$$p(x) = \delta_1 x + \frac{\delta_2}{2!}\, x^2 + \cdots + \frac{\delta_K}{K!}\, x^K \tag{21}$$

its $K$-truncated antipode, defined by

$$\bar p(x) = \bar\delta_1 x + \frac{\bar\delta_2}{2!}\, x^2 + \cdots + \frac{\bar\delta_K}{K!}\, x^K, \tag{22}$$

where $\bar\delta_m$ is given as in (13) to (15) and in the appendix. Truncation refers to the fact
that even if $\delta_i = 0$ for $i > K$ with some $K$, the coordinates of the inverse $\bar\delta_i$
for $i > K$ are not necessarily zero. When defining the truncated antipode, those higher order terms
are omitted. Also, note that the $K$-truncated antipode of $\bar p$ is $p$ itself (see the appendix
for details):

$$\bar{\bar p} = p. \tag{23}$$


Then, using least squares, the polynomial fit, that is, the vector of parameter
estimates $\Delta = (\delta_1, \delta_2, \dots, \delta_K)$, is found by minimizing the function

$$\ell^2(\delta_1, \delta_2, \dots, \delta_K) = \sum_{i=1}^{n} \left[ \left( l_i^s - p(x_i) \right)^2 + \left( \bar l_i^s - \bar p(y_i) \right)^2 \right]. \tag{24}$$

The nonlinear regression $\varphi$ is then obtained via integration

$$\varphi(x) = \int_0^x e^{p(u)}\, du. \tag{25}$$

Note that the antipode was defined so that

$$\varphi^{-1}(y) = \int_0^y e^{\bar p(u)}\, du, \tag{26}$$

by neglecting the error originating from the truncation. In (26), the inverse of the diffeomorphism
is obtained from the antipode of the corresponding log derivative polynomial. That is, the inverse
always exists, and it can be found relatively easily, unlike in the case of polynomial regression as
outlined in the introduction. Moreover, by the definition of $\ell^2$ and by (20) and (23),

$$\ell^2(\bar\delta_1, \bar\delta_2, \dots, \bar\delta_K) = \ell^2(\delta_1, \delta_2, \dots, \delta_K), \tag{27}$$

where the left-hand side is evaluated on the inverse scatter plot. Note that (27) directly implies
the symmetry property of the regression $\varphi$.

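For the sake of explicitness, expand (24) in the case of a degree three approximation scheme:

$$
\begin{aligned}
\ell^2(\delta_1, \delta_2, \delta_3) = \sum_{i=1}^{n} \Bigg[ & \left( l_i^s - \delta_1 x_i - \frac{\delta_2}{2}\, x_i^2 - \frac{\delta_3}{6}\, x_i^3 \right)^2 \\
& + \left( -l_i^s + \delta_1 y_i - \frac{-\delta_2 + \delta_1^2}{2}\, y_i^2 - \frac{-\delta_3 + 4\delta_1\delta_2 - 2\delta_1^3}{6}\, y_i^3 \right)^2 \Bigg].
\end{aligned}
\tag{28}
$$

The estimation consists of finding $(\delta_1, \delta_2, \delta_3)$ in the neighborhood of $(0, 0, 0)$
so that $\ell^2(\delta_1, \delta_2, \delta_3)$ of (28) is minimal.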
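
The degree three objective and its symmetry can be sketched in code. Everything below is illustrative: the data are synthetic (exact log derivatives of $\varphi(x) = e^x - 1$), and the compass search is a simple stand-in optimizer chosen here, since the text does not prescribe a particular minimization method.

```python
import math

def antipode3(d1, d2, d3):
    """Inverse coordinates for degree three, from (13) to (15)."""
    return (-d1, -d2 + d1 ** 2, -d3 + 4 * d1 * d2 - 2 * d1 ** 3)

def poly3(deltas, t):
    """d1*t + d2*t^2/2! + d3*t^3/3!."""
    d1, d2, d3 = deltas
    return d1 * t + d2 * t ** 2 / 2.0 + d3 * t ** 3 / 6.0

def loss(deltas, xs, ys, logs):
    """Degree three objective; -logs is used for the inverse scatter plot."""
    bar = antipode3(*deltas)
    return sum((l - poly3(deltas, x)) ** 2 + (-l - poly3(bar, y)) ** 2
               for x, y, l in zip(xs, ys, logs))

def compass_search(f, start, step=0.5, tol=1e-4, max_sweeps=2000):
    """Derivative-free coordinate search (a stand-in optimizer)."""
    point, best = list(start), f(start)
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for k in range(len(point)):
            for sign in (1.0, -1.0):
                trial = list(point)
                trial[k] += sign * step
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # refine the mesh once no neighbour improves
    return tuple(point), best

# Synthetic example: points on exp(x) - 1, whose log derivative is x.
xs = [-0.4, -0.2, 0.0, 0.2, 0.4]
ys = [math.exp(x) - 1.0 for x in xs]
logs = list(xs)

loss0 = loss((0.0, 0.0, 0.0), xs, ys, logs)
best_deltas, best = compass_search(lambda d: loss(d, xs, ys, logs),
                                   (0.0, 0.0, 0.0))
```

Evaluating `loss` at the antipode coordinates on the inverse scatter plot (with `xs` and `ys` exchanged and the signs of `logs` flipped) returns the same value as on the original scatter plot, which is the symmetry property expressed by (27).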


5 Conclusion 

An approach to symmetric regression based on diffeomorphisms of the real line and their
Hopf algebra of affine coordinates was introduced in this paper. This method provides a practical
way of handling the inverse of the regression function together with the function itself, thereby 
providing a tool to handle symmetric regression. While some preliminary steps towards solving 
the problem in general are presented, the paper should be considered as a research plan rather 
than a report on a finished product. 

Several questions are left open. Some of them are relatively small technical matters. A
notable example is the proof of the statements about the linear symmetric slope function
around (6) through (8). Others are potentially difficult questions. An example would be
the effect of truncation of the antipode in (22).

Another large topic would be to relate the technique presented here to the usual equating 
methods as applied in current practice (von Davier et al., 2004). 

Also, the method explicitly uses the estimates of the derivatives as derived from the data.
This works as presented only when the data set is ordered in the sense that the piecewise linear
function it defines is strictly monotonic. If this is not the case, then a sort of averaging procedure
should be introduced (in a symmetric fashion, of course) to estimate the slopes.

Yet another future direction could be to extend the procedure to higher dimensional 
diffeomorphisms via the higher order Connes-Moscovici Hopf algebras. This extension could be 
useful when equating tests utilizing a multidimensional item response theory model. 
Notes


1 The use of increasing diffeomorphisms is only to keep the connection with test equating alive. 
As this paper shows, the exact same procedure handles arbitrary diffeomorphisms. 

2 If the diffeomorphism is decreasing, then the linear part will be decreasing and the nonlinear
part will be increasing; that is, it too will be in $G_2$.

3 Results in this appendix are taken from Connes and Moscovici (1998). 
Appendix

A.1 Connes-Moscovici Hopf Algebra


For the general theory of Hopf algebras, the reader is referred to Sweedler (1969). The
one-dimensional Connes-Moscovici Hopf algebra $\mathcal{H}(1)$ was found by Connes and Moscovici
(1998) while working on the transverse index theorem of foliations.³ As an algebra, it is the
universal enveloping algebra of the Lie algebra generated by $X$, $Y$, $\{\delta_n\}_{n=1}^{\infty}$
subject to the following commutation relations:

$$[X, Y] = -X, \tag{29}$$

$$[X, \delta_n] = \delta_{n+1}, \tag{30}$$

$$[Y, \delta_n] = n\, \delta_n, \tag{31}$$

$$[\delta_n, \delta_m] = 0. \tag{32}$$

The coproduct, however, is not the usual one for enveloping algebras. It is defined by

$$\Delta Y = Y \otimes 1 + 1 \otimes Y, \tag{33}$$

$$\Delta X = X \otimes 1 + 1 \otimes X + \delta_1 \otimes Y, \tag{34}$$

$$\Delta \delta_1 = \delta_1 \otimes 1 + 1 \otimes \delta_1, \tag{35}$$

$$\Delta \delta_n = \Delta [X, \delta_{n-1}]. \tag{36}$$

The counit $\varepsilon$ and the antipode $S$ are defined on generators as follows:

$$\varepsilon(X) = \varepsilon(Y) = \varepsilon(\delta_n) = 0, \tag{37}$$

$$S(Y) = -Y, \quad S(X) = -X + \delta_1 Y, \tag{38}$$

$$S(\delta_1) = -\delta_1, \quad S(\delta_n) = S([X, \delta_{n-1}]). \tag{39}$$

The maps above extend to $\mathcal{H}(1)$, endowing it with a Hopf algebra structure.

The focus is the antipode $\bar\delta_n = S(\delta_n)$ of $\delta_n$. In particular, it is shown how
these increasingly complicated expressions can be derived in a systematic manner. From the
definition, $S(\delta_1) = -\delta_1$. To compute $S(\delta_2)$, proceed as follows:

$$
\begin{aligned}
\bar\delta_2 = S(\delta_2) &= S([X, \delta_1]) \\
&= -[S(X), S(\delta_1)] \\
&= [-X + \delta_1 Y, \delta_1] \\
&= [-X, \delta_1] + [\delta_1 Y, \delta_1] \\
&= -\delta_2 + \delta_1 [Y, \delta_1] \\
&= -\delta_2 + \delta_1^2.
\end{aligned}
\tag{40}
$$


The interested reader may derive $\bar\delta_3$ using the same line of argument. Also, it is
worthwhile to note that Menous (2005) contains explicit formulae for the antipode.
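
The same line of argument can be automated. Repeating the computation in (40) for a general index suggests the recursion $\bar\delta_{n+1} = X(\bar\delta_n) - n\, \delta_1 \bar\delta_n$, where $X$ acts as the derivation sending $\delta_k$ to $\delta_{k+1}$; this recursion is worked out here from (30), (31), (38), and (39) and is not stated in the text. The sketch below represents a monomial by the sorted tuple of its indices, for example `(1, 1, 2)` for $\delta_1^2 \delta_2$; all names are illustrative.

```python
from collections import defaultdict

def derive(poly):
    """Apply the derivation delta_k -> delta_{k+1} (Leibniz rule)."""
    out = defaultdict(int)
    for mono, coeff in poly.items():
        for pos in range(len(mono)):
            new = list(mono)
            new[pos] += 1  # raise the index of one factor
            out[tuple(sorted(new))] += coeff
    return out

def times_delta1(poly, factor):
    """Multiply a polynomial by factor * delta_1."""
    out = defaultdict(int)
    for mono, coeff in poly.items():
        out[tuple(sorted(mono + (1,)))] += factor * coeff
    return out

def antipodes(n_max):
    """Return the antipodes of delta_1, ..., delta_{n_max}.

    Each entry is a dict mapping a monomial (sorted index tuple)
    to its integer coefficient.
    """
    current = {(1,): -1}  # S(delta_1) = -delta_1
    result = [current]
    for n in range(1, n_max):
        nxt = defaultdict(int)
        for mono, coeff in derive(current).items():
            nxt[mono] += coeff
        for mono, coeff in times_delta1(current, -n).items():
            nxt[mono] += coeff
        current = {m: c for m, c in nxt.items() if c != 0}
        result.append(current)
    return result

table = antipodes(5)
```

For instance, `table[2]` holds the coefficients $-1$, $4$, and $-2$ of $\delta_3$, $\delta_1 \delta_2$, and $\delta_1^3$, in agreement with (15).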

Even though it is very useful to consider the whole of the Connes-Moscovici Hopf algebra,
it is also worthwhile to note the commutative sub-Hopf algebra $\mathcal{H}_\delta$ generated by
$\delta_n$ for $n > 0$. The commutativity of $\mathcal{H}_\delta$ implies that the antipode is
involutive: $S^2(a) = a$ for all $a \in \mathcal{H}_\delta$. This shows, in particular, that
$S^2(\delta_{i_1} \cdots \delta_{i_k}) = \delta_{i_1} \cdots \delta_{i_k}$. Hence, for a polynomial $p$
the truncated antipode of the truncated antipode is $p$ itself: $\bar{\bar p} = p$; see (23).


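A.2 Antipode of $\delta_n$ up to $n = 10$

$$
\begin{aligned}
\bar\delta_1 ={}& -\delta_1, \\
\bar\delta_2 ={}& -\delta_2 + \delta_1^2, \\
\bar\delta_3 ={}& -\delta_3 + 4\delta_1\delta_2 - 2\delta_1^3, \\
\bar\delta_4 ={}& 6\delta_1^4 - 18\delta_2\delta_1^2 + 7\delta_3\delta_1 + 4\delta_2^2 - \delta_4, \\
\bar\delta_5 ={}& -24\delta_1^5 + 96\delta_2\delta_1^3 - 46\delta_3\delta_1^2 - 52\delta_2^2\delta_1 + 11\delta_4\delta_1 + 15\delta_2\delta_3 - \delta_5, \\
\bar\delta_6 ={}& 120\delta_1^6 - 600\delta_2\delta_1^4 + 326\delta_3\delta_1^3 + 548\delta_2^2\delta_1^2 - 101\delta_4\delta_1^2 - 271\delta_2\delta_3\delta_1 + 16\delta_5\delta_1 \\
& - 52\delta_2^3 + 15\delta_3^2 + 26\delta_2\delta_4 - \delta_6, \\
\bar\delta_7 ={}& -720\delta_1^7 + 4320\delta_2\delta_1^5 - 2556\delta_3\delta_1^4 - 5688\delta_2^2\delta_1^3 + 932\delta_4\delta_1^3 + 3700\delta_2\delta_3\delta_1^2 \\
& - 197\delta_5\delta_1^2 + 1408\delta_2^3\delta_1 - 361\delta_3^2\delta_1 - 629\delta_2\delta_4\delta_1 + 22\delta_6\delta_1 - 427\delta_2^2\delta_3 \\
& + 56\delta_3\delta_4 + 42\delta_2\delta_5 - \delta_7, \\
\bar\delta_8 ={}& 5040\delta_1^8 - 35280\delta_2\delta_1^6 + 22212\delta_3\delta_1^5 + 61416\delta_2^2\delta_1^4 - 9080\delta_4\delta_1^4 \\
& - 47500\delta_2\delta_3\delta_1^3 + 2311\delta_5\delta_1^3 - 26920\delta_2^3\delta_1^2 + 6227\delta_3^2\delta_1^2 + 10899\delta_2\delta_4\delta_1^2 \\
& - 351\delta_6\delta_1^2 + 14613\delta_2^2\delta_3\delta_1 - 1743\delta_3\delta_4\delta_1 - 1317\delta_2\delta_5\delta_1 + 29\delta_7\delta_1 + 1408\delta_2^4 \\
& - 1215\delta_2\delta_3^2 + 56\delta_4^2 - 1056\delta_2^2\delta_4 + 98\delta_3\delta_5 + 64\delta_2\delta_6 - \delta_8, \\
\bar\delta_9 ={}& -40320\delta_1^9 + 322560\delta_2\delta_1^7 - 212976\delta_3\delta_1^6 - 703008\delta_2^2\delta_1^5 + 94852\delta_4\delta_1^5 \\
& + 613892\delta_2\delta_3\delta_1^4 - 27568\delta_5\delta_1^4 + 461024\delta_2^3\delta_1^3 - 97316\delta_3^2\delta_1^3 - 171012\delta_2\delta_4\delta_1^3 \\
& + 5119\delta_6\delta_1^3 - 340164\delta_2^2\delta_3\delta_1^2 + 37297\delta_3\delta_4\delta_1^2 + 28368\delta_2\delta_5\delta_1^2 - 583\delta_7\delta_1^2 \\
& - 65104\delta_2^4\delta_1 + 51400\delta_2\delta_3^2\delta_1 - 2191\delta_4^2\delta_1 + 44859\delta_2^2\delta_4\delta_1 - 3844\delta_3\delta_5\delta_1 \\
& - 2531\delta_2\delta_6\delta_1 + 37\delta_8\delta_1 - 1215\delta_3^3 + 20245\delta_2^3\delta_3 - 6285\delta_2\delta_3\delta_4 \\
& - 2373\delta_2^2\delta_5 + 210\delta_4\delta_5 + 162\delta_3\delta_6 + 93\delta_2\delta_7 - \delta_9, \\
\bar\delta_{10} ={}& 362880\delta_1^{10} - 3265920\delta_2\delta_1^8 + 2239344\delta_3\delta_1^7 + 8584992\delta_2^2\delta_1^6 - 1066644\delta_4\delta_1^6 \\
& - 8208900\delta_2\delta_3\delta_1^5 + 342964\delta_5\delta_1^5 - 7664256\delta_2^3\delta_1^4 + 1489736\delta_3^2\delta_1^4 \\
& + 2627260\delta_2\delta_4\delta_1^4 - 73639\delta_6\delta_1^4 + 6900116\delta_2^2\delta_3\delta_1^3 - 701317\delta_3\delta_4\delta_1^3 \\
& - 536596\delta_2\delta_5\delta_1^3 + 10366\delta_7\delta_1^3 + 1969008\delta_2^4\delta_1^2 - 1434876\delta_2\delta_3^2\delta_1^2 \\
& + 57016\delta_4^2\delta_1^2 - 1256931\delta_2^2\delta_4\delta_1^2 + 100261\delta_3\delta_5\delta_1^2 + 66504\delta_2\delta_6\delta_1^2 - 916\delta_8\delta_1^2 \\
& + 62335\delta_3^3\delta_1 - 1122949\delta_2^3\delta_3\delta_1 + 323677\delta_2\delta_3\delta_4\delta_1 + 122952\delta_2^2\delta_5\delta_1 \\
& - 10116\delta_4\delta_5\delta_1 - 7833\delta_3\delta_6\delta_1 - 4534\delta_2\delta_7\delta_1 + 46\delta_9\delta_1 - 65104\delta_2^5 \\
& + 112135\delta_2^2\delta_3^2 - 8476\delta_2\delta_4^2 + 210\delta_5^2 + 65104\delta_2^3\delta_4 - 9930\delta_3^2\delta_4 - 14875\delta_2\delta_3\delta_5 \\
& - 4904\delta_2^2\delta_6 + 372\delta_4\delta_6 + 255\delta_3\delta_7 + 130\delta_2\delta_8 - \delta_{10}.
\end{aligned}
\tag{41}
$$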
A.3 Affine Coordinates and the Antipode


There is, of course, a deeper reason for the introduction of the affine coordinates
$\delta_n(\varphi)$. To keep the paper more accessible, however, only a small justification is
provided for them by relating the antipode formulae (39) to the affine coordinates of the inverse of
the diffeomorphism. First write the first few coordinates explicitly:

$$\delta_1(\varphi) = \varphi''(0), \tag{42}$$

$$\delta_2(\varphi) = -\varphi''(0)^2 + \varphi^{(3)}(0), \tag{43}$$

$$\delta_3(\varphi) = 2\varphi''(0)^3 - 3\varphi^{(3)}(0)\, \varphi''(0) + \varphi^{(4)}(0). \tag{44}$$

Moreover, by definition, $\delta_n(\varphi^{-1}) = \bar\delta_n(\varphi)$. To see how this compares to
$S(\delta_n)$ introduced before, first observe that

$$\delta_n(\varphi^{-1}) = \left( \log (\varphi^{-1})' \right)^{(n)}(0) = \left( \log \frac{1}{\varphi' \circ \varphi^{-1}} \right)^{(n)}(0). \tag{45}$$

Using this, one obtains

$$\delta_1(\varphi^{-1}) = -\varphi''(0), \tag{46}$$

$$\delta_2(\varphi^{-1}) = 2\varphi''(0)^2 - \varphi^{(3)}(0), \tag{47}$$

$$\delta_3(\varphi^{-1}) = -8\varphi''(0)^3 + 7\varphi^{(3)}(0)\, \varphi''(0) - \varphi^{(4)}(0). \tag{48}$$

It is an easy exercise to see that

$$\delta_1(\varphi^{-1}) = -\delta_1(\varphi), \tag{49}$$

$$\delta_2(\varphi^{-1}) = -\delta_2(\varphi) + \delta_1(\varphi)^2, \tag{50}$$

$$\delta_3(\varphi^{-1}) = -\delta_3(\varphi) + 4\delta_1(\varphi)\, \delta_2(\varphi) - 2\delta_1(\varphi)^3, \tag{51}$$

which is in accordance with (13) to (15).
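
A concrete instance may help in checking these relations by hand. The example below is chosen here for illustration and is not taken from Connes and Moscovici (1998); it uses the map $\varphi(x) = x/(1-x)$, which satisfies $\varphi(0) = 0$ and $\varphi'(0) = 1$ but is defined only for $x < 1$, so it serves as a local check near 0 rather than as an element of $G_2$.

$$
\begin{aligned}
&\varphi''(0) = 2, \quad \varphi^{(3)}(0) = 6, \quad \varphi^{(4)}(0) = 24, \\
&\delta_1(\varphi) = 2, \quad \delta_2(\varphi) = -2^2 + 6 = 2, \quad \delta_3(\varphi) = 2 \cdot 2^3 - 3 \cdot 6 \cdot 2 + 24 = 4, \\
&\delta_1(\varphi^{-1}) = -2, \quad \delta_2(\varphi^{-1}) = 2 \cdot 2^2 - 6 = 2, \quad \delta_3(\varphi^{-1}) = -8 \cdot 2^3 + 7 \cdot 6 \cdot 2 - 24 = -4, \\
&-\delta_1 = -2, \quad -\delta_2 + \delta_1^2 = 2, \quad -\delta_3 + 4\delta_1\delta_2 - 2\delta_1^3 = -4 + 16 - 16 = -4.
\end{aligned}
$$

The third and fourth lines agree. The values also match a direct computation with the inverse $\varphi^{-1}(y) = y/(1+y)$, for which $\log (\varphi^{-1})'(y) = -2\log(1+y)$ has derivatives $-2$, $2$, and $-4$ at $y = 0$.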